Short Text Classification Using Deep Representation: A Case Study of Spanish Tweets in Coset Shared Task

Authors

  • Erfaneh Gharavi
  • Kayvan Bijari

Abstract

Topic identification, as a specific case of text classification, is one of the primary steps toward extracting knowledge from raw textual data. In such tasks, words are treated as a set of features. Because traditional feature selection methods produce high-dimensional, sparse feature vectors, most text classification methods proposed for this purpose lack performance and accuracy. These problems are even more pronounced for tweets, which contain only a limited number of words. To alleviate these issues, we propose a new topic identification method for Spanish tweets based on a deep representation of Spanish words. In the proposed method, words are represented as multi-dimensional vectors; that is, each word is replaced with its equivalent vector, computed through a transformation of the raw text data. An average aggregation technique is used to combine the word vectors into a tweet representation. Our model is trained on this deep vectorized representation of the tweets, and an ensemble of different classifiers is used for Spanish tweet classification. The best result was obtained by a fully connected multi-layer neural network with three hidden layers. The experimental results demonstrate the feasibility and scalability of the proposed method.
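The pipeline described in the abstract (average aggregation of word vectors into a tweet vector, then a fully connected network with three hidden layers) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy embedding table, the hidden-layer widths, the ReLU/softmax activations, and the random untrained weights are all hypothetical. Training and the classifier ensemble are omitted, so the output probabilities carry no meaning beyond showing the shapes involved.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)

# Hypothetical 3-dimensional toy embeddings; real Spanish word vectors would be
# learned from a large corpus and have many more dimensions.
EMBEDDINGS: Dict[str, List[float]] = {
    "elecciones": [0.9, 0.1, 0.0],
    "debate": [0.7, 0.3, 0.1],
    "candidato": [0.5, 0.5, 0.2],
}


def tweet_vector(tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average aggregation: mean of the vectors of known tokens (zeros if none are known)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def relu(z: float) -> float:
    return max(0.0, z)


def softmax(zs: List[float]) -> List[float]:
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x: List[float], layer: Layer, act: Callable[[float], float]) -> List[float]:
    """Fully connected layer: act(W x + b), one output per weight row."""
    weights, biases = layer
    return [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Hypothetical uniform random initialization with zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x: List[float], layers: List[Layer]) -> List[float]:
    """ReLU on every hidden layer, softmax over the output layer's logits."""
    h = x
    for layer in layers[:-1]:
        h = dense(h, layer, relu)
    return softmax(dense(h, layers[-1], lambda z: z))


rng = random.Random(0)
# Input size, three hidden layers (hypothetical widths of 16), five COSET categories.
sizes = [3, 16, 16, 16, 5]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# "hola" is not in the toy vocabulary, so it is skipped by the average.
vec = tweet_vector(["elecciones", "debate", "hola"], EMBEDDINGS, dim=3)
probs = mlp_forward(vec, layers)
print(vec, probs)
```

Skipping out-of-vocabulary tokens (here `hola`) keeps the average from being pulled toward zero. Counting them as zero vectors is another plausible choice, and the abstract does not say which one was used.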

Similar resources

Comparative Study of Neural Models for the COSET Shared Task at IberEval 2017

This paper describes our participation in the Classification Of Spanish Election Tweets (COSET) task at IberEval 2017. While searching for the best classification system, we carried out a comparative study of possible combinations of corpus preprocessing, text representations and classification models. After an initial exploration of models, we focus our attention on specific neural...

IberEval 2017, COSET Task: A Basic Approach

This paper discusses the IberEval 2017 shared task on Classification Of Spanish Election Tweets (COSET) [5]. The goal of this task is to analyze tweets about the 2015 Spanish General Election and classify each of them into one of five categories: political issues, policy issues, personal issues, campaign issues and other issues.

ELiRF-UPV at IberEval 2017: Classification Of Spanish Election Tweets (COSET)

This paper describes the participation of ELiRF-UPV team at Classification Of Spanish Election Tweets (COSET) task. We tested several approaches based on different classifiers and features representations. Our main approach is based on neural networks, concretely, Multilayer Perceptrons (MLP) with bag-of-words representation of the tweets. Our system achieved the best score on the test set of t...

Classification Of Spanish Election Tweets (COSET) 2017 : Classifying Tweets Using Character and Word Level Features

This paper describes the International Institute of Information Technology of Hyderabad’s submission to the task Classification Of Spanish Election Tweets (COSET) as a part of IBEREVAL-2017[1]. The task is to classify Spanish election tweets into political, policy, personal, campaign and other issues. Our system uses Support Vector Machines with radial basis function kernel to classify tweets. ...

Classification Of Spanish Election Tweets (COSET) with Neural Networks

Obtaining information from tweets has become a field of interest in recent years due to its power to provide information about the insights of the users when any relevant event occurs. This is useful for companies and political parties that take advantage of this information in order to plan their next actions or to know whether or not their current actions are being received well by their publ...

Publication date: 2017